Incident Report: Persistent Alerts and Transaction Retrieval Issues Across Multiple Networks

Date: 2023-11-06
Time: 10:16 AM (UTC+3)
Duration: 5 hours

Description

Multiple "Unable to get txn" alerts kept reappearing even after being manually closed. They were triggered mainly by failures to retrieve transactions ('txn') on Kava, Moonbeam, and Gnosis-testnet. In addition, testnets generated numerous alerts, particularly the BSC testnet and Scroll-Goerli-testnet.

Root Cause

The root cause appears to be problems with the RPC endpoints for several testnets, which made transaction retrieval fail and caused alerts to fire repeatedly. Specific contributing factors were overly large block range fetches and unreliable RPCs for certain testnets.
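
To illustrate how a collector can tolerate an unreliable RPC endpoint, here is a minimal sketch of trying several upstreams in order. It is not the collectors' actual code: the function name `call_with_fallback`, the `call` callback, and the upstream list are all hypothetical, and a real implementation would likely add timeouts and retries.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_fallback(upstreams: Sequence[str], call: Callable[[str], T]) -> T:
    """Try call(url) on each upstream in order and return the first success.

    `upstreams` and `call` are hypothetical names for illustration; `call`
    stands in for whatever RPC request the collector makes.
    """
    errors = []
    for url in upstreams:
        try:
            return call(url)
        except Exception as exc:  # a real collector would catch narrower errors
            errors.append(f"{url}: {exc}")
    # Every upstream failed, so surface all collected errors at once.
    raise RuntimeError("all upstreams failed: " + "; ".join(errors))
```

With this shape, an alert would only need to fire when every configured upstream fails, rather than on each failure of a single flaky endpoint.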

Impact

The incident produced a continuous stream of alerts, which made it hard to spot and manage genuine issues. It also reduced the efficiency of the event collectors, although they were still keeping up with the chains.

Timeline

  • 10:16 AM: Hau noticed repeated alerts for transaction retrieval issues on multiple networks.
  • 12:41 PM: Vekil performed the initial diagnosis of the issue.
  • 12:53 PM: Bedirhan started to fix the issue.
  • 01:02 PM: Bedirhan resolved the issue.

Lessons Learned

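This incident highlights the importance of reliable RPC endpoints: unreliable upstreams generated repeated alerts that made genuine issues harder to see. It also underscores the challenges of working with testnets, whose RPCs proved unreliable in this case.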

Actions Taken

  1. Reduced the block range fetch for the BSC testnet to 500.
  2. Updated upstreams for both BSC-testnet and Scroll-Goerli.
  3. Advised using the public RPC for Scroll-Goerli, as it was more reliable.
  4. Created a policy to not auto-suppress and auto-close related alerts.
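
The block range reduction in action 1 can be sketched as splitting a large span of blocks into chunks of at most 500. This is an illustrative sketch only, assuming the limit is counted in blocks and ranges are inclusive; the names `MAX_BLOCK_RANGE` and `block_ranges` are hypothetical and not taken from the collectors.

```python
from typing import Iterator, Tuple

# Assumed limit from action 1; the constant name is hypothetical.
MAX_BLOCK_RANGE = 500


def block_ranges(
    start: int, end: int, max_range: int = MAX_BLOCK_RANGE
) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (from_block, to_block) pairs covering start..end.

    Each pair spans at most `max_range` blocks, so no single RPC request
    asks the upstream for more than that many blocks.
    """
    if max_range < 1:
        raise ValueError("max_range must be at least 1")
    current = start
    while current <= end:
        upper = min(current + max_range - 1, end)
        yield current, upper
        current = upper + 1


# Example: 1,200 blocks are fetched as three requests instead of one.
for from_block, to_block in block_ranges(0, 1199):
    print(from_block, to_block)
```

Smaller ranges mean more requests per sync, but each request is cheaper for the upstream to serve, which is the trade-off behind lowering the limit.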

Incident Reviewer(s)

  • Vekil
  • Bedirhan